WHO COVID-19 Research Database

Improving Text Clustering Using a New Technique for Selecting Trustworthy Content in Social Networks

Diaz-Garcia, J. A.; Fernandez-Basso, C.; Gutiérrez-Batista, K.; Ruiz, M. D.; Martin-Bautista, M. J.

19th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2022; 1602 CCIS:275-287, 2022.

Article in English | Scopus | ID: covidwho-1971509

ABSTRACT

Today’s information society has given rise to a large number of applications that generate and consume digital data. Many of these applications are based on social networks, so their information often comes in the form of unstructured text. Social media text also tends to contain a high level of noise and untrustworthy content, which makes systems capable of handling it efficiently highly relevant. Verifying the trustworthiness of social media content requires analysing and exploring social media data with text mining techniques. One of the most widespread text mining techniques is text clustering, which automatically groups similar documents into categories. Because text clustering is very sensitive to noise, in this paper we propose a pre-processing pipeline based on word embeddings that selects trustworthy content and discards noise, thereby improving clustering results. To validate the proposed pipeline, we present a real use case on a Twitter dataset related to COVID-19. © 2022, Springer Nature Switzerland AG.
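The abstract does not say how the word-embedding pipeline decides which content is trustworthy, so the following is only a minimal sketch of the general idea of filtering noise before clustering, not the authors' method. It assumes a hypothetical hand-made embedding table (`EMBEDDINGS`), averages word vectors into a document vector, treats a document as noise if none of its words are known or if it has no sufficiently similar neighbour (an illustrative cosine threshold of 0.9), and then runs plain k-means on the documents that survive. All names, vectors, and the threshold are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy word embeddings (3 dimensions). A real pipeline would use
# pre-trained vectors such as word2vec or GloVe.
EMBEDDINGS: Dict[str, Sequence[float]] = {
    "vaccine": (1.0, 0.1, 0.0), "dose": (0.9, 0.2, 0.0), "pfizer": (1.0, 0.0, 0.0),
    "lockdown": (0.1, 1.0, 0.0), "curfew": (0.2, 0.9, 0.0), "restrictions": (0.0, 1.0, 0.0),
    "click": (0.1, 0.0, 1.0), "giveaway": (0.0, 0.0, 1.0),
}


def embed(text: str) -> Optional[List[float]]:
    """Average the vectors of known words; return None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_trustworthy(docs: List[str], threshold: float = 0.9) -> Dict[int, List[float]]:
    """Keep documents whose embedding is close to at least one other document.

    Documents with no known words, or isolated in embedding space, are
    treated as noise (an illustrative rule, not the paper's criterion).
    """
    embedded: Dict[int, List[float]] = {}
    for i, doc in enumerate(docs):
        vector = embed(doc)
        if vector is not None:
            embedded[i] = vector
    kept: Dict[int, List[float]] = {}
    for i, e in embedded.items():
        best = max((cosine(e, f) for j, f in embedded.items() if j != i), default=0.0)
        if best >= threshold:
            kept[i] = e
    return kept


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Dict[int, List[float]], k: int, iterations: int = 20) -> Dict[int, int]:
    """Plain k-means with deterministic farthest-point initialisation."""
    ids = sorted(points)
    centers = [points[ids[0]]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(ids, key=lambda i: min(squared_distance(points[i], c) for c in centers))
        centers.append(points[far])
    labels: Dict[int, int] = {}
    for _ in range(iterations):
        new_labels = {
            i: min(range(k), key=lambda c: squared_distance(points[i], centers[c]))
            for i in ids
        }
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in ids if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical example tweets: two topics, one spam tweet, one unknown-word tweet.
TWEETS = [
    "Vaccine dose Pfizer",
    "Second vaccine dose",
    "Lockdown curfew restrictions",
    "New lockdown restrictions",
    "Click giveaway now",
    "lol xyz !!!",
]

trusted = select_trustworthy(TWEETS)
print(sorted(trusted))          # [0, 1, 2, 3]
print(kmeans(trusted, k=2))     # {0: 0, 1: 0, 2: 1, 3: 1}
```

In this toy run the spam tweet and the tweet with no known words are discarded before clustering, so k-means only has to separate the vaccine tweets from the lockdown tweets. Filtering first means noisy documents cannot pull cluster centers away from the genuine topics, which is the effect the abstract describes.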
